Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
نویسندگان
چکیده
The sound of crashing waves, the roar of fastmoving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds. This paper extends an earlier conference paper, Owens et al (2016b), with additional experiments and discussion.
منابع مشابه
Ambient Sound Provides Supervision for Visual Learning
The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, thro...
متن کاملLearning to Localize Sound Source in Visual Scenes
Visual events are usually accompanied by sounds in our daily lives. We pose the question: Can the machine learn the correspondence between visual scene and the sound, and localize the sound source only by observing sound and visual scene pairs like human? In this paper, we propose a novel unsupervised algorithm to address the problem of localizing the sound source in visual scenes. A two-stream...
متن کاملA New Method to Improve Automated Classification of Heart Sound Signals: Filter Bank Learning in Convolutional Neural Networks
Introduction: Recent studies have acknowledged the potential of convolutional neural networks (CNNs) in distinguishing healthy and morbid samples by using heart sound analyses. Unfortunately the performance of CNNs is highly dependent on the filtering procedure which is applied to signal in their convolutional layer. The present study aimed to address this problem by a...
متن کاملVisual to Sound: Generating Natural Sound for Videos in the Wild
As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in vir...
متن کاملL-Ball: Designing A Novel Sports Electronic Audio Ball for Visual Impairment Student
Background. A field study conducted by researchers found the balls used by visual impairment students at school basically used the sound by a small bell inside the ball. However, the sound emitted from the ball is very limited, the ball will sound when it is moved. This makes it difficult for students with visual impairments to find a missing ball that not emitted makes a sound. Objectives. A ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1712.07271 شماره
صفحات -
تاریخ انتشار 2017